A classification procedure for the effective management of changes during the maintenance process
During software operation, maintainers are often faced with numerous change requests. Given available resources such as effort and calendar time, changes, if approved, have to be planned to fit within budget and schedule constraints. In this paper, we address the issue of assessing the difficulty of a change based on known or predictable data. This paper should be considered a first step towards the construction of customized economic models for maintainers. In it, we propose a modeling approach, based on regular statistical techniques, that can be used in a variety of software maintenance environments. The approach can be easily automated and is simple for people with limited statistical experience to use. Moreover, it deals effectively with the uncertainty usually associated with both model inputs and outputs. The modeling approach is validated on a data set provided by NASA/GSFC, which shows that it is effective in classifying changes with respect to the effort involved in implementing them. Other advantages of the approach are discussed, along with additional steps to improve the results.
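A minimal sketch of the general idea, not the paper's exact procedure: an ordinary statistical classifier (here, logistic regression from scikit-learn) trained to classify change requests as low- or high-effort from simple, known-in-advance predictors. The predictors and data below are hypothetical illustrations.

```python
# A minimal sketch (not the paper's exact procedure): classifying change
# requests into "low-effort" vs. "high-effort" with ordinary logistic
# regression. The feature names and data are hypothetical illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-change predictors: components touched, LOC modified,
# and whether the change crosses subsystem boundaries.
X = np.array([
    [1, 20, 0],
    [3, 150, 1],
    [2, 45, 0],
    [5, 300, 1],
    [1, 10, 0],
    [4, 220, 1],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = low-effort change, 1 = high-effort change

clf = LogisticRegression()
clf.fit(X, y)

# Class probabilities expose the uncertainty of the prediction rather than
# forcing a hard label, in the spirit of the paper's treatment of uncertainty.
print(clf.predict_proba([[2, 80, 1]]))
```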
Reinforcement Learning for Test Case Prioritization
Continuous Integration (CI) significantly reduces integration problems, speeds up development time, and shortens release time. However, it also introduces new challenges for quality assurance activities, including regression testing, which is the focus of this work. Though various approaches to test case prioritization have been shown to be very promising in the context of regression testing, specific techniques must be designed to deal with the dynamic nature and timing constraints of CI.

Recently, Reinforcement Learning (RL) has shown great potential in various challenging scenarios that require continuous adaptation, such as game playing, real-time ads bidding, and recommender systems. Inspired by this line of work, and building on initial efforts in supporting test case prioritization with RL techniques, we perform a comprehensive investigation of RL-based test case prioritization in a CI context. To this end, taking test case prioritization as a ranking problem, we model the sequential interactions between the CI environment and a test case prioritization agent as an RL problem, using three alternative ranking models. We then rely on carefully selected and tailored state-of-the-art RL techniques to automatically and continuously learn a test case prioritization strategy whose objective is to be as close as possible to the optimal one. Our extensive experimental analysis shows that the best RL solutions provide a significant accuracy improvement over previous RL-based work, with prioritization strategies getting close to optimal, thus paving the way for using RL to prioritize test cases in a CI context.
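As a concrete illustration of the learn-and-reorder loop, here is a minimal, bandit-style sketch in which per-test values are updated from failure feedback after each CI cycle. The paper's approach uses three ranking models and tailored state-of-the-art RL algorithms; the reward shape, test names, and epsilon-greedy policy below are simplifying assumptions.

```python
# A minimal, bandit-style sketch of RL-driven test prioritization across CI
# cycles. The paper relies on far more sophisticated ranking models and RL
# algorithms; this only illustrates the learn-and-reorder loop.
import random
from collections import defaultdict

value = defaultdict(float)   # estimated usefulness of each test case
counts = defaultdict(int)
EPSILON = 0.1                # exploration rate

def prioritize(tests):
    """Order tests by learned value, with epsilon-greedy exploration."""
    ranked = sorted(tests, key=lambda t: value[t], reverse=True)
    if random.random() < EPSILON:
        random.shuffle(ranked)
    return ranked

def update(ordered_tests, failed):
    """Reward tests that fail early in the schedule (hypothetical reward)."""
    n = len(ordered_tests)
    for pos, t in enumerate(ordered_tests):
        reward = (n - pos) / n if t in failed else 0.0
        counts[t] += 1
        value[t] += (reward - value[t]) / counts[t]  # incremental mean

# One simulated CI cycle with hypothetical test names and failures:
order = prioritize(["t1", "t2", "t3", "t4"])
update(order, failed={"t3"})
```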
An Experimental Scrutiny of Visual Design Modelling: VCL up against UML+OCL
The graphical nature of prominent modelling notations, such as the standards UML and SysML, enables them to tap into the cognitive benefits of diagrams. However, these notations hardly exploit the cognitive potential of diagrams and are only partially graphical, with invariants and operations being expressed textually. The Visual Contract Language (VCL) aims at improving visual modelling; it tries to (a) maximise diagrammatic cognitive effectiveness, (b) increase visual expressivity, and (c) raise the level of rigour and formality. It is an alternative to UML that does largely pictorially what is traditionally done textually. The paper presents the results of a controlled experiment, carried out four times in different academic settings and involving 43 participants, which compares VCL against UML and OCL and whose goal is to provide insight into the benefits and limitations of visual modelling. The paper's hypotheses are evaluated using a crossover design with the following tasks: (i) modelling of state space, invariants, and operations, (ii) comprehension of the modelled problem, (iii) detection of model defects, and (iv) comprehension of a given model. Although visual approaches have been used and advocated for decades, this is the first empirical investigation looking into the effects of graphical expression of invariants and operations on modelling and model usage tasks. Results suggest VCL benefits in defect detection, model comprehension, and modelling of operations, providing some empirical evidence of the benefits of graphical software design.
Effective Removal of Operational Log Messages: an Application to Model Inference
Model inference aims to extract accurate models from the execution logs of software systems. However, in reality, logs may contain some "noise" that can deteriorate the performance of model inference. One form of noise can commonly be found in system logs that contain not only transactional messages---logging the functional behavior of the system---but also operational messages---recording the operational state of the system (e.g., a periodic heartbeat to keep track of memory usage). In low-quality logs, transactional and operational messages are randomly interleaved, leading to the erroneous inclusion of operational behaviors in a system model that should ideally reflect only the functional behavior of the system. It is therefore important to remove operational messages from the logs before inferring models.

In this paper, we propose LogCleaner, a novel technique for removing operational log messages. LogCleaner first performs a periodicity analysis to filter out periodic messages; it then performs a dependency analysis to compute the degree of dependency of all log messages and to remove operational messages based on their dependencies. Experimental results on two proprietary and 11 publicly available log datasets show that LogCleaner can, on average, accurately remove 98% of the operational messages while preserving 81% of the transactional messages. Furthermore, using logs pre-processed with LogCleaner decreases the execution time of model inference (with a speed-up ranging from 1.5 to 946.7 depending on the characteristics of the system) and significantly improves the accuracy of the inferred models, increasing their ability to accept correct system behaviors (+43.8 percentage points (pp) on average) and to reject incorrect system behaviors (+15.0 pp on average).
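To make the periodicity-analysis step concrete, here is a minimal sketch, assuming occurrences of a log-message template can be grouped and timestamped: a template whose inter-arrival times are near-constant (like a heartbeat) is flagged as operational. LogCleaner's actual analyses, in particular its dependency analysis, are more elaborate; the tolerance threshold and example timestamps are hypothetical.

```python
# A minimal sketch of the periodicity-analysis idea: a message template whose
# occurrences arrive at near-constant intervals (e.g., a heartbeat) is flagged
# as operational. LogCleaner's actual analysis is more elaborate than this.
from statistics import mean, pstdev

def is_periodic(timestamps, tolerance=0.1):
    """Flag a template as periodic if its inter-arrival times vary little."""
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) <= tolerance * mean(gaps)

# Hypothetical occurrence times (in seconds) of two log templates:
heartbeat = [0.0, 5.0, 10.1, 15.0, 20.0]  # near-constant gaps -> operational
checkout = [1.2, 7.9, 8.3, 19.4]          # irregular gaps -> transactional
print(is_periodic(heartbeat), is_periodic(checkout))  # True False
```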
Evaluating Model Testing and Model Checking for Finding Requirements Violations in Simulink Models
Matlab/Simulink is a development and simulation language that is widely used by the Cyber-Physical System (CPS) industry to model dynamical systems. There are two mainstream approaches to verifying CPS Simulink models: model testing, which attempts to identify failures in models by executing them for a number of sampled test inputs, and model checking, which attempts to exhaustively check the correctness of models against some given formal properties. In this paper, we present an industrial Simulink model benchmark, provide a categorization of the different model types in the benchmark, describe the recurring logical patterns in the model requirements, and discuss the results of applying model checking and model testing to identify requirements violations in the benchmarked models. Based on the results, we discuss the strengths and weaknesses of model testing and model checking. Our results further suggest that model checking and model testing are complementary, and that by combining them we can significantly enhance the capabilities of each approach individually. We conclude by providing guidelines on how the two approaches can best be applied together.
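For intuition about the model-testing side, here is a minimal falsification sketch: sample inputs, execute the model, and report any input that violates a requirement. A real setup would drive Matlab/Simulink; the `simulate` function and the overshoot requirement below are hypothetical stand-ins.

```python
# A minimal sketch of the model-testing (falsification) side: sample inputs,
# execute the model, and search for a requirement violation. A real setup
# would drive Matlab/Simulink; `simulate` below is a hypothetical stand-in.
import random

def simulate(step_input):
    """Hypothetical stand-in for running a Simulink model on one input."""
    # Pretend the controller overshoots more for larger step inputs.
    return 1.0 + 0.08 * step_input  # peak output of the simulated response

def violates_requirement(peak):
    return peak > 1.5  # hypothetical requirement: overshoot stays below 50%

def falsify(n_samples=1000):
    for _ in range(n_samples):
        u = random.uniform(0.0, 10.0)
        if violates_requirement(simulate(u)):
            return u  # counterexample found by testing
    # No failure observed; unlike model checking, this is not a proof.
    return None

print(falsify())
```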
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
Test suites tend to grow as software evolves, often making it infeasible to execute all test cases within the allocated testing budget, especially for large software systems. Test suite minimization (TSM) is therefore employed to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on the test code (black-box), such as ATM and FAST-R, have been proposed. To address scalability, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs). To support similarity measurement, we investigated three different pre-trained language models---CodeBERT, GraphCodeBERT, and UniXcoder---to extract embeddings of test code, on which we computed two similarity measures: cosine similarity and Euclidean distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), thus reducing the overall search time. Experimental results under a 50% minimization budget show that the best configuration of LTM (UniXcoder with cosine similarity) outperformed the best two configurations of ATM in three key respects: (a) achieving a greater saving rate in testing time (40.38% versus 38.06%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites much faster (26.73 minutes versus 72.75 minutes, on average), in terms of both preparation time (up to two orders of magnitude faster) and search time (one order of magnitude faster).
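A minimal sketch of the similarity-measurement step, assuming the Hugging Face transformers library and the publicly available microsoft/codebert-base checkpoint (LTM also evaluates GraphCodeBERT and UniXcoder): embed each test method and compare embeddings with cosine similarity. Mean pooling and the Java snippets are illustrative choices, not necessarily the paper's; the GA-based minimization that consumes these similarities is omitted.

```python
# A minimal sketch of the similarity-measurement step: embed test code with a
# pre-trained model and compare embeddings with cosine similarity. LTM then
# feeds such similarities to a Genetic Algorithm, which is omitted here.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(test_code: str) -> torch.Tensor:
    """Mean-pooled token embeddings of one test method (one option of many)."""
    inputs = tokenizer(test_code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

# Hypothetical Java test methods passed in as strings:
t1 = embed("@Test public void addsTwoNumbers() { assertEquals(4, add(2, 2)); }")
t2 = embed("@Test public void addsNegatives() { assertEquals(-4, add(-2, -2)); }")
print(torch.cosine_similarity(t1, t2, dim=0).item())
```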
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Deep Neural Networks (DNNs) have been widely used to perform real-world tasks in cyber-physical systems such as Autonomous Driving Systems (ADS). Ensuring the correct behavior of such DNN-Enabled Systems (DES) is a crucial topic. Online testing is one of the promising modes for testing such systems together with their application environments (simulated or real) in a closed loop, taking into account the continuous interaction between the systems and their environments. However, the environmental variables (e.g., lighting conditions) that might change during a system's operation in the real world, causing the DES to violate requirements (safety, functional), are often kept constant during the execution of an online test scenario, due to two major challenges: (1) the space of all possible scenarios to explore would become even larger if they changed, and (2) there are typically many requirements to test simultaneously.

In this paper, we present MORLOT (Many-Objective Reinforcement Learning for Online Testing), a novel online testing approach that addresses these challenges by combining Reinforcement Learning (RL) and many-objective search. MORLOT leverages RL to incrementally generate sequences of environmental changes, while relying on many-objective search to determine the changes so that they are more likely to achieve any of the uncovered objectives. We empirically evaluate MORLOT using CARLA, a high-fidelity simulator widely used in autonomous driving research, integrated with Transfuser, a DNN-enabled ADS for end-to-end driving. The evaluation results show that MORLOT is significantly more effective and efficient than the alternatives, with a large effect size. In other words, MORLOT is a good option for testing DES with dynamically changing environments while accounting for multiple safety requirements.
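A minimal sketch of the many-objective idea, with random search standing in for the learned RL policy purely for illustration: at each step, the search targets whichever uncovered objective is currently closest to being violated. The environmental variables, fitness functions, and thresholds below are hypothetical.

```python
# A minimal sketch of the many-objective idea in MORLOT: at each step, steer
# the environmental change toward whichever uncovered objective is closest to
# being violated. The actual approach trains an RL agent; here random search
# over changes stands in for the learned policy, purely for illustration.
import random

# Hypothetical fitness functions: 0 means the requirement is violated
# (objective covered); larger values mean farther from violation.
def safety_distance(env): return max(0.0, 5.0 - env["fog"])
def lane_keeping(env):    return max(0.0, 8.0 - env["rain"])

objectives = {"safety": safety_distance, "lane": lane_keeping}
env = {"fog": 0.0, "rain": 0.0}
uncovered = set(objectives)

for _ in range(100):
    if not uncovered:
        break
    # Focus on the uncovered objective currently closest to violation.
    target = min(uncovered, key=lambda o: objectives[o](env))
    variable, delta = random.choice([("fog", 0.5), ("rain", 0.5)])
    env[variable] += delta  # one incremental environmental change
    if objectives[target](env) == 0.0:
        uncovered.remove(target)  # requirement violated: objective achieved

print("covered:", set(objectives) - uncovered)
```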
Automated, Cost-effective, and Update-driven App Testing
Apps' pervasive role in our society has led to the definition of test automation approaches to ensure their dependability. However, state-of-the-art approaches tend to generate large numbers of test inputs and are unlikely to achieve more than 50% method coverage. In this paper, we propose a strategy to achieve significantly higher coverage of the code affected by updates with a much smaller number of test inputs, thus alleviating the test oracle problem. More specifically, we present ATUA, a model-based approach that synthesizes App models with static analysis, integrates a dynamically refined state abstraction function, and combines complementary testing strategies, including (1) coverage of the model structure, (2) coverage of the App code, (3) random exploration, and (4) coverage of dependencies identified through information retrieval. Its model-based strategy enables ATUA to generate a small set of inputs that exercise only the code affected by the updates. In turn, this makes common test oracle solutions more cost-effective, as they tend to involve human effort. A large empirical evaluation, conducted with 72 App versions belonging to nine popular Android Apps, has shown that ATUA is more effective and less effort-intensive than state-of-the-art approaches when testing App updates.
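A minimal sketch of the update-driven selection idea, assuming an App model whose transitions are already mapped (e.g., via static analysis) to the methods they exercise: keep only the transitions that reach methods changed by the update. All states, actions, and method names are hypothetical, and ATUA itself layers several further strategies on top of this.

```python
# A minimal sketch of the update-driven idea in ATUA: given an App model whose
# transitions are mapped (via static analysis) to methods, select only the
# transitions that exercise methods changed by the update. All names here are
# hypothetical; ATUA itself combines several further testing strategies.
app_model = {
    # (state, action) -> (next_state, methods exercised)
    ("home", "tap_login"):    ("login", {"LoginActivity.onCreate"}),
    ("login", "submit_form"): ("feed", {"AuthService.login", "Feed.load"}),
    ("feed", "pull_refresh"): ("feed", {"Feed.refresh"}),
}
updated_methods = {"AuthService.login"}  # from diffing the two App versions

relevant = [
    (state, action)
    for (state, action), (_, methods) in app_model.items()
    if methods & updated_methods
]
print(relevant)  # only the inputs reaching code affected by the update
```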
Security slicing for auditing XML, XPath, and SQL injection vulnerabilities
XML, XPath, and SQL injection vulnerabilities are among the most common and serious security issues for Web applications and Web services. Thus, it is important for security auditors to ensure that the implemented code is, to the extent possible, free from these vulnerabilities before deployment. Although existing taint analysis approaches can automatically detect potential vulnerabilities in source code, they tend to generate many false warnings. Furthermore, the produced traces, i.e., data-flow paths from input sources to security-sensitive operations, tend to be incomplete or to contain a great deal of irrelevant information. It is therefore difficult to identify real vulnerabilities and determine their causes. One suitable approach to support security auditing is to compute a program slice for each security-sensitive operation, since such a slice contains all the information required for performing security audits (soundness). A limitation, however, is that such slices may also contain information that is irrelevant to security (hurting precision), thus raising scalability issues for security audits. In this paper, we propose an approach to assist security auditors by defining and experimenting with pruning techniques that reduce original program slices to what we refer to as security slices, which contain sound and precise information. To evaluate the proposed pruning mechanism, we used a number of open-source benchmarks and compared our security slices with the slices generated by a state-of-the-art program slicing tool. On average, our security slices are 80% smaller than the original slices, suggesting a significant reduction in auditing costs.
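A minimal sketch of the starting point, a backward slice over data dependencies from one security-sensitive sink: every statement the sink transitively depends on is kept, while unrelated statements fall out. The statements and dependency map below are hypothetical, and the paper's pruning of such slices into security slices uses dedicated techniques not reproduced here.

```python
# A minimal sketch of slicing for a security-sensitive operation: walk the
# data-dependency graph backward from a sink (e.g., an SQL query execution)
# and keep the reachable statements. The paper's pruning of such slices into
# "security slices" uses dedicated techniques not reproduced here.
from collections import deque

# Hypothetical data dependencies: statement -> statements it depends on.
deps = {
    "execute(query)": ["query = build(user)"],
    "query = build(user)": ["user = request.getParameter('u')"],
    "user = request.getParameter('u')": [],
    "log.info(user)": ["user = request.getParameter('u')"],  # not in the slice
}

def backward_slice(sink):
    """All statements the sink transitively depends on (plus the sink)."""
    seen, queue = {sink}, deque([sink])
    while queue:
        for d in deps.get(queue.popleft(), []):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen

print(backward_slice("execute(query)"))
```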